10 Security and Privacy Problems in Self-Supervised Learning
Self-supervised learning has achieved revolutionary progress in the past
several years and is commonly believed to be a promising approach for
general-purpose AI. In particular, self-supervised learning aims to pre-train
an encoder using a large amount of unlabeled data. The pre-trained encoder is
like an "operating system" of the AI ecosystem. Specifically, the encoder can
be used as a feature extractor for many downstream tasks with little or no
labeled training data. Existing studies on self-supervised learning mainly
focused on pre-training a better encoder to improve its performance on
downstream tasks in non-adversarial settings, leaving its security and privacy
in adversarial settings largely unexplored. A security or privacy issue of a
pre-trained encoder leads to a single point of failure for the AI ecosystem. In
this book chapter, we discuss 10 basic security and privacy problems for the
pre-trained encoders in self-supervised learning, including six confidentiality
problems, three integrity problems, and one availability problem. For each
problem, we discuss potential opportunities and challenges. We hope our book
chapter will inspire future research on the security and privacy of
self-supervised learning. Comment: A book chapte
Breaking Free from Fusion Rule: A Fully Semantic-driven Infrared and Visible Image Fusion
Infrared and visible image fusion plays a vital role in the field of computer
vision. Previous approaches make efforts to design various fusion rules in the
loss functions. However, these experimentally designed fusion rules make the
methods more and more complex. Besides, most of them only focus on boosting the
visual effects, thus showing unsatisfactory performance for the follow-up
high-level vision tasks. To address these challenges, in this letter, we
develop a semantic-level fusion network to sufficiently utilize the semantic
guidance, removing the need for experimentally designed fusion rules. In addition, to
achieve a better semantic understanding of the feature fusion process, a fusion
block based on the transformer is presented in a multi-scale manner. Moreover,
we devise a regularization loss function, together with a training strategy, to
fully use semantic guidance from the high-level vision tasks. Compared with
state-of-the-art methods, our method does not depend on the hand-crafted fusion
loss function. Still, it achieves superior performance in both visual quality
and the follow-up high-level vision tasks.
Optimal covariance matrix estimation for high-dimensional noise in high-frequency data
In this paper, we consider efficiently learning the structural information
from the high-dimensional noise in high-frequency data via estimating its
covariance matrix with optimality. The problem is uniquely challenging due to
the latency of the targeted high-dimensional vector containing the noises, and
the practical reality that the observed data can be highly asynchronous -- not
all components of the high-dimensional vector are observed at the same time
points. To meet the challenges, we propose a new covariance matrix estimator
with appropriate localization and thresholding. In the setting with latency and
asynchronous observations, we establish the minimax optimal convergence rates
associated with two commonly used loss functions for the covariance matrix
estimations. As a major theoretical development, we show that despite the
latency of the signal in the high-frequency data, the optimal rates remain the
same as if the targeted high-dimensional noises are directly observable. Our
results indicate that the optimal rates reflect the impact due to the
asynchronous observations, which are slower than those with synchronous
observations. Furthermore, we demonstrate that the proposed localized estimator
with thresholding achieves the minimax optimal convergence rates. We also
illustrate the empirical performance of the proposed estimator with extensive
simulation studies and a real data analysis.
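
To make the idea of a thresholded covariance estimator concrete, here is a minimal Python sketch. It is not the localized estimator proposed in the abstract above: it assumes the noise is directly and synchronously observed, and it applies plain hard thresholding to the sample covariance. The function name `hard_threshold_covariance`, the threshold value, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def hard_threshold_covariance(samples, threshold):
    """Sample covariance with small off-diagonal entries set to zero.

    Assumption: rows of `samples` are synchronous, directly observed
    noise vectors, which is simpler than the setting in the abstract.
    """
    cov = np.cov(samples, rowvar=False)
    # Keep only entries whose magnitude reaches the threshold.
    kept = np.where(np.abs(cov) >= threshold, cov, 0.0)
    # Variances on the diagonal are always retained.
    np.fill_diagonal(kept, np.diag(cov))
    return kept


# Simulated noise: components 0 and 1 are correlated, the rest independent.
rng = np.random.default_rng(0)
noise = rng.normal(size=(500, 4))
noise[:, 1] += 0.8 * noise[:, 0]

estimate = hard_threshold_covariance(noise, threshold=0.3)
print(np.round(estimate, 2))
```

Thresholding removes small entries that are likely sampling error, which is the usual motivation for such estimators when the true covariance matrix is assumed to be sparse.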
StolenEncoder: Stealing Pre-trained Encoders in Self-supervised Learning
Pre-trained encoders are general-purpose feature extractors that can be used
for many downstream tasks. Recent progress in self-supervised learning can
pre-train highly effective encoders using a large volume of unlabeled data,
leading to the emerging encoder as a service (EaaS). A pre-trained encoder may
be deemed confidential because its training requires lots of data and
computation resources and because its public release may facilitate misuse of
AI, e.g., for deepfake generation. In this paper, we propose the first attack
called StolenEncoder to steal pre-trained image encoders. We evaluate
StolenEncoder on multiple target encoders pre-trained by ourselves and three
real-world target encoders including the ImageNet encoder pre-trained by
Google, CLIP encoder pre-trained by OpenAI, and Clarifai's General Embedding
encoder deployed as a paid EaaS. Our results show that our stolen encoders have
similar functionality to the target encoders. In particular, the downstream
classifiers built upon a target encoder and a stolen one have similar accuracy.
Moreover, stealing a target encoder using StolenEncoder requires much less data
and computation resources than pre-training it from scratch. We also explore
three defenses that perturb feature vectors produced by a target encoder. Our
results show these defenses are not enough to mitigate StolenEncoder. Comment: To appear in ACM Conference on Computer and Communications Security
(CCS), 202
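
As a toy illustration of why query access to an encoder can be enough to copy its functionality, the Python sketch below fits a surrogate to the feature vectors returned by a black-box encoder. It is not the StolenEncoder attack itself: the "target encoder" here is an assumed linear map, the surrogate is fitted by ordinary least squares, and `query_target`, the dimensions, and the query budget are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target encoder: a fixed linear map the attacker cannot see.
target_weights = rng.normal(size=(8, 3))


def query_target(inputs):
    """Black-box API: returns feature vectors for the given inputs."""
    return inputs @ target_weights


# The attacker sends unlabeled queries and records the returned features.
queries = rng.normal(size=(200, 8))
responses = query_target(queries)

# Fit a surrogate encoder that reproduces the observed features.
stolen_weights, *_ = np.linalg.lstsq(queries, responses, rcond=None)

# Compare surrogate and target on inputs the attacker never queried.
held_out = rng.normal(size=(50, 8))
gap = np.max(np.abs(held_out @ stolen_weights - query_target(held_out)))
print(f"max feature gap on held-out inputs: {gap:.2e}")
```

A real image encoder is nonlinear, so an attacker would train a neural surrogate rather than solve a least-squares problem, but the structure is the same: query, record features, and fit a model to match them.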
Dual Adversarial Resilience for Collaborating Robust Underwater Image Enhancement and Perception
Due to the uneven scattering and absorption of different light wavelengths in
aquatic environments, underwater images suffer from low visibility and clear
color deviations. With the advancement of autonomous underwater vehicles,
extensive research has been conducted on learning-based underwater enhancement
algorithms. These works can generate visually pleasing enhanced images and
mitigate the adverse effects of degraded images on subsequent perception tasks.
However, learning-based methods are inherently vulnerable to
adversarial attacks, which can significantly disrupt their results. In this work,
we introduce a collaborative adversarial resilience network, dubbed CARNet, for
underwater image enhancement and subsequent detection tasks. Concretely, we
first introduce an invertible network with strong perturbation-perceptual
abilities to isolate attacks from underwater images, preventing interference
with image enhancement and perceptual tasks. Furthermore, we propose a
synchronized attack training strategy with both visual-driven and
perception-driven attacks enabling the network to discern and remove various
types of attacks. Additionally, we incorporate an attack pattern discriminator
to heighten the robustness of the network against different attacks. Extensive
experiments demonstrate that the proposed method outputs visually appealing
enhanced images and achieves on average 6.71% higher detection mAP than
state-of-the-art methods. Comment: 9 pages, 9 figure
AdvMono3D: Advanced Monocular 3D Object Detection with Depth-Aware Robust Adversarial Training
Monocular 3D object detection plays a pivotal role in the field of autonomous
driving and numerous deep learning-based methods have made significant
breakthroughs in this area. Despite the advancements in detection accuracy and
efficiency, these models tend to fail when faced with adversarial attacks, rendering
them ineffective. Therefore, bolstering the adversarial robustness of 3D
detection models has become a crucial issue that demands immediate attention
and innovative solutions. To mitigate this issue, we propose a depth-aware
robust adversarial training method for monocular 3D object detection, dubbed
DART3D. Specifically, we first design an adversarial attack that iteratively
degrades the 2D and 3D perception capabilities of 3D object detection
models (IDP), which serves as the foundation for our subsequent defense mechanism. In
response to this attack, we propose an uncertainty-based residual learning
method for adversarial training. Our adversarial training approach capitalizes
on the inherent uncertainty, enabling the model to significantly improve its
robustness against adversarial attacks. We conducted extensive experiments on
the KITTI 3D datasets, demonstrating that DART3D surpasses direct adversarial
training (the most popular approach) under attacks in 3D object detection
of the car category for the Easy, Moderate, and Hard settings, with
improvements of 4.415%, 4.112%, and 3.195%, respectively.
Improving Misaligned Multi-modality Image Fusion with One-stage Progressive Dense Registration
Misalignments between multi-modality images pose challenges in image fusion,
manifesting as structural distortions and edge ghosts. Existing efforts
commonly resort to registering first and fusing later, typically employing two
cascaded stages for registration, i.e., coarse registration and fine
registration. Both stages directly estimate the respective target deformation
fields. In this paper, we argue that the separated two-stage registration is
not compact, and the direct estimation of the target deformation fields is not
accurate enough. To address these challenges, we propose a Cross-modality
Multi-scale Progressive Dense Registration (C-MPDR) scheme, which accomplishes
the coarse-to-fine registration exclusively using a one-stage optimization,
thus improving the fusion performance of misaligned multi-modality images.
Specifically, two pivotal components are involved: a dense Deformation Field
Fusion (DFF) module and a Progressive Feature Fine (PFF) module. The DFF
aggregates the predicted multi-scale deformation sub-fields at the current
scale, while the PFF progressively refines the remaining misaligned features.
Both work together to accurately estimate the final deformation fields. In
addition, we develop a Transformer-Conv-based Fusion (TCF) subnetwork that
considers local and long-range feature dependencies, allowing us to capture
more informative features from the registered infrared and visible images for
the generation of high-quality fused images. Extensive experimental analysis
demonstrates the superiority of the proposed method in the fusion of misaligned
cross-modality images.
WaterFlow: Heuristic Normalizing Flow for Underwater Image Enhancement and Beyond
Underwater images suffer from light refraction and absorption, which impairs
visibility and interferes with subsequent applications. Existing underwater
image enhancement methods mainly focus on image quality improvement, ignoring
their effect in practice. To balance the visual quality and application, we
propose a heuristic normalizing flow for detection-driven underwater image
enhancement, dubbed WaterFlow. Specifically, we first develop an invertible
mapping to achieve the translation between the degraded image and its clear
counterpart. Considering the differentiability and interpretability, we
incorporate the heuristic prior into the data-driven mapping procedure, where
the ambient light and medium transmission coefficient benefit credible
generation. Furthermore, we introduce a detection perception module to transmit
the implicit semantic guidance into the enhancement procedure, where the
enhanced images hold more detection-favorable features and are able to promote
the detection performance. Extensive experiments demonstrate the superiority of
WaterFlow over state-of-the-art methods, both quantitatively and qualitatively. Comment: 10 pages, 13 figure